NEMO5: Achieving High-end Internode Communication for Performance Projection Beyond Moore's Law

Authors

Robert Andrawis

José David Bermeo

James Charles

Jianbin Fang

Jim Fonseca

Yu He

Gerhard Klimeck

Zhengping Jiang

Tillmann Kubis

Daniel F. Mejia

Daniel Lemus

Michael Povolotskyi

Santiago A. Pérez-Rubiano

Prasad Sarangapani

Lang Zeng

Abstract

Electronic performance predictions of modern nanotransistors require nonequilibrium Green's functions that include incoherent scattering on phonons as well as random alloy disorder and surface roughness. Solving for all these effects together is numerically extremely expensive and has to be done on the world's largest supercomputers because of the large memory requirement and the high performance demands on the communication network between the compute nodes. This work shows that NEMO5 covers all required physical effects and their combination. It also shows that NEMO5's implementation of the algorithm scales very well up to about 178,176 CPUs with a sustained performance of about 857 TFLOPS. Therefore, NEMO5 is ready to simulate future nanotransistors.

Overview of the problem and its importance

State-of-the-art and future semiconductor logic devices are on the nanometer length scale. The thickness of ultrathin body (UTB) transistors in the year 2020 is predicted to be around 3 nm [1]. Given that a typical distance between two semiconductor atoms is about 0.1 nm, the number of atoms is countable. Atom-sized device features cannot establish material properties that require a large number of atoms. Therefore, the prediction of future device properties cannot rely on material properties. Instead, detailed calculations of the concrete device structure in subatomic resolution are vital: device defects and imperfections, such as alloy and dopant atom distributions or roughness fluctuations, influence the device performance. At any finite temperature, device atoms vibrate around their ideal lattice positions, i.e. they support phonons. Device electrons scatter on these phonons incoherently, which randomizes electron energy and momentum. Such scattering is an important contribution to the device resistance and has to be included in reliable performance predictions as well.
These incoherent effects can counterbalance or enhance coherent nanoscale effects such as tunneling and confinement. For instance, random device fluctuations can confine (localize) electrons in some device sections and delocalize them in others. Reliable prediction of nanoscale device performance therefore requires a consistent treatment of all of the above effects.

This work considers electronic transport in ultra-thin double-gated nanotransistors (Fig. 1). The device is 28 nm long in its transport direction and has a body thickness of 3 nm. The device is assumed to be periodic in the in-plane direction perpendicular to transport. Devices with any kind of random disorder are, strictly speaking, not periodic. Therefore, the simulated devices extend in the periodic direction beyond the minimum ideal unit cell. In fact, the simulated device is as large as possible in the periodic direction to avoid artificial effects due to the assumed periodicity. All nanodevices in this work are atomically resolved, typically with about 24,000 atoms.

Figure 1: Atomically resolved UTB device considered in this work. The confinement direction is only 3 nm long. Surface roughness is shown in the inset.

Figure 2: Current-voltage characteristics of the considered UTB device with (red) and without (black) incoherent scattering on optical and acoustic phonons.

The structures in this work are all silicon alloyed with 10% germanium. This configuration can be modeled with two approaches. The first approach, the so-called "virtual crystal approximation" (VCA) [2], idealizes a fictitious atom type that has 90% Si and 10% Ge properties. The ideal approach considers Si and Ge atoms explicitly. The Ge atoms are randomly distributed across the simulation domain. The random nature of this method requires solving all observables for many (>100) different Ge distributions (samples); the results are averaged afterwards. Electrons are represented in the sp3d5s* empirical tight binding method, i.e. each atom hosts 10 orbitals [3], which adds up to a matrix rank of 240,000. The simulations include scattering on all relevant phonons (inelastic scattering on optical and elastic scattering on acoustic phonons) as well as charge self-consistent solutions of the Poisson equation. NEMO5 covers all these features, but their solution is notoriously expensive and requires massively parallel computer systems (as discussed in later sections). This work shows that NEMO5 supports all required physics and scales very well for these types of computational problems. Figure 2 illustrates how NEMO5 covers the impact of phonon scattering on the device current-voltage characteristics: it compares the ballistic current versus applied gate voltage with the scattered result including acoustic and optical phonon scattering. Reference [17] shows NEMO5's unique and most recent method to model randomness in the device and in the leads.

Quantitative discussion of current state of the art for science and performance

It is widely accepted that the nonequilibrium Green's function (NEGF) method consistently describes coherent and incoherent transport. Small-size effects such as tunneling, confinement and interference, as well as any sort of incoherent scattering, are treated on an equal footing in the NEGF method. However, the NEGF method is numerically very expensive: it requires the solution of 4 nonlinear partial differential equations that are mutually coupled. Because of this mutual dependence, the solutions of the individual equations have to be kept in memory. Given the large number of variables of each NEGF equation (atomic resolution typically requires about a quarter million variables for electron transport in this work), and given that each equation has to be solved for a large number of parameters (energy and momentum give about 16,000 energy-momentum tuples) and for many voltage and randomness configurations, large, massively parallel compute clusters are indispensable for these simulation tasks.
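The problem size quoted above can be made concrete with a little arithmetic. The following is a back-of-the-envelope sketch, not NEMO5 code: it assumes double-precision complex numbers (16 bytes each) and assumes that only the matrix diagonal of one Green's function is kept for every energy-momentum tuple. The function name and the storage layout are hypothetical and chosen only for illustration.

```python
BYTES_PER_COMPLEX = 16  # assumption: double-precision complex (2 x 8 bytes)


def problem_size(atoms=24_000, orbitals_per_atom=10, tuples=16_000):
    """Return (matrix rank, stored entries, gigabytes) for one Green's function.

    Assumption for illustration: only the matrix diagonal is stored for
    every energy-momentum tuple.
    """
    rank = atoms * orbitals_per_atom   # sp3d5s*: 10 orbitals per atom
    entries = rank * tuples            # one diagonal per (E, k) tuple
    gigabytes = entries * BYTES_PER_COMPLEX / 1e9
    return rank, entries, gigabytes


rank, entries, gigabytes = problem_size()
print(f"matrix rank: {rank:,}")
print(f"stored diagonal entries: {entries:,}")
print(f"memory for one diagonal-only Green's function: {gigabytes:.2f} GB")
```

Under these assumptions, a single diagonal-only Green's function already occupies tens of gigabytes, before counting the second Green's function, the self-energies, or any off-diagonal matrix blocks.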
In addition, predicting the electron density and its distribution in the nanodevice requires coupling the NEGF method charge self-consistently with the Poisson equation. To predict current-voltage characteristics, the solution of the NEGF and Poisson equations has to be repeated for each voltage-boundary configuration. When randomness is present in the device, as in the case of disordered alloys or rough interfaces, NEGF/Poisson calculations for many randomness samples are needed to increase the reliability of the observations.

The NEGF method has been applied to a great variety of transport problems, ranging from phononic [4] to electronic transport [5, 6], covering metals [7, 8], semiconductors [9] as well as nanotubes [10, 11] and fullerenes [12] or even (organic) molecules [13, 14]. The complexity of the NEGF method often motivates approximations such as nanometer-only resolution, neglect of incoherent scattering, assumption of ideal device fabrication, etc. Due to the immense numerical load, calculations of ITRS-relevant, concrete devices in atomic resolution including incoherent scattering are very rare compared to the abundance of NEGF publications. References [15, 16] are such exceptions and are all based on earlier incarnations of the NEMO (NanoElectronic MOdeling) tool. NEMO5, the latest version of the NEMO tools, was designed to support all possible nanodevice simulation needs, including and exceeding the functionality of its predecessors. NEMO5 contains an important addition to the modeling capabilities of the NEMO tools: it allows modeling of non-ideal leads, which is vital for reliable assessments of the impact of randomness on the device performance [17]. Compared to earlier NEMO versions, NEMO5 handles the NEGF equations numerically more efficiently: by exploiting analytical dependencies of the Green's functions and self-energies, i.e. all solution functions of the NEGF equations, NEMO5 can systematically avoid about 50% of the calculations of earlier NEMO versions while still preserving the full accuracy of all NEGF equations. While this improvement saves many CPU hours, it simultaneously reduces the ratio of computation to communication. NEMO5 also supports computation on coprocessors (such as Intel Xeon Phi) and GPUs. NEMO5 is academic open source and is used by many groups in academia and industry.

Claims made for innovation and its implementation

This work is an important milestone toward reliably assessing how relevant the alloy disorder and surface roughness typical of chip fabrication are for the performance of the next transistor generations. In contrast to typical studies, this work combines the randomness with incoherent scattering on all relevant phonons. This way, the balance of coherent and incoherent effects in the presence of randomness can be assessed. A conclusive assessment of this balance with statistics of 100 random samples for one concrete UTB transistor requires about 20 million CPU hours (with typically 100 iterations of the NEGF and Poisson equations per bias point) on at least several thousand nodes due to the high memory usage. Since this numerical load exceeds the computational infrastructure currently available to the project's team, this work focuses on important algorithmic improvements and efficient numerical implementations in NEMO5. This work shows that 1) all involved physics are covered and 2) the algorithm implementation of NEMO5 utilizes supercomputing performance to a very high level of efficiency.

All electronic transport properties are solved with the NEGF method. In the stationary limit, the NEGF equations are solved with the electronic "lesser than" and retarded Green's functions, $G^<$ and $G^R$, respectively. Their differential equations read in matrix form

$$G^R(k, E) = \left[E - H(k) - e\Phi - \Sigma^R(k, E)\right]^{-1}, \tag{1}$$

$$G^<(k, E) = G^R(k, E)\,\Sigma^<(k, E)\,G^{R\dagger}(k, E). \tag{2}$$

All Green's functions and scattering self-energies $\Sigma^<$ and $\Sigma^R$ are functions of the electronic energy $E$ and transverse momentum $k$. Scattering is represented with acoustic and optical deformation potential phonons. In this case, the total self-energies read

$$\Sigma^{R,<}(k, E) = \Sigma^{R,<}_{\mathrm{lead}}(k, E) + \Sigma^{R,<}_{\mathrm{optical}}(k, E) + \Sigma^{R,<}_{\mathrm{acoustic}}(k, E), \tag{3}$$

$$\Sigma^{R,<}_{\mathrm{acoustic}}(k, E) = \frac{D k_B T}{2 h \omega_D \rho v^2 A} \int dk'\, \mathrm{diag}\, G^{R,<}(k', E), \tag{4}$$

$$\Sigma^{<}_{\mathrm{optical}}(k, E) = \frac{h \Xi^2}{8\pi \omega_0 A} \int dk' \left[(1 + n_0)\, \mathrm{diag}\, G^<\!\left(k', E + \frac{h\omega_0}{2\pi}\right) + n_0\, \mathrm{diag}\, G^<\!\left(k', E - \frac{h\omega_0}{2\pi}\right)\right], \tag{5}$$

$$\Sigma^{R}_{\mathrm{optical}}(k, E) = \frac{h \Xi^2}{8\pi \omega_0 A} \int dk' \left[(1 + n_0)\, \mathrm{diag}\, G^R\!\left(k', E - \frac{h\omega_0}{2\pi}\right) + n_0\, \mathrm{diag}\, G^R\!\left(k', E + \frac{h\omega_0}{2\pi}\right) + \frac{1}{2}\, \mathrm{diag}\, G^<\!\left(k', E - \frac{h\omega_0}{2\pi}\right) - \frac{1}{2}\, \mathrm{diag}\, G^<\!\left(k', E + \frac{h\omega_0}{2\pi}\right)\right]. \tag{6}$$

It is worth mentioning that Eq. (6) contains, in its exact representation, a principal value integral that is usually ignored because it only makes a small contribution to the resonant energies [15, 18, 19]. Here, the equations are given in the form actually implemented. All integrals run over all electronic momenta in the first Brillouin zone. In Eqs. (4)-(6), the deformation potential of acoustic phonons $D$, the sound velocity $v$, the Debye frequency $\omega_D$, the optical phonon frequency $\omega_0$, the lattice constant of silicon $a$, and the deformation potential of optical phonons $\Xi$ are taken from experimental publications [20]. The Planck constant $h$, the Boltzmann constant $k_B$ and the area $A$ covered by a single atom perpendicular to the periodic device direction are given by nature. The temperature $T$ is room temperature throughout this work. The self-energies $\Sigma^{R,<}_{\mathrm{lead}}$ in Eq. (3) describe the coupling of device electrons with the charge reservoirs via 2 semi-infinite leads. In the case of a VCA description of the device alloy, the leads are solved with the transfer matrix method of Ref. [21], whereas in the case of discrete random Ge distributions, the leads include the randomness as well and are therefore solved with NEMO5's adaptation of the complex absorbing potential method [17]. Due to their mutual dependence, Eqs. (1)-(6) are solved iteratively.
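The iterative solution of Eqs. (1)-(6) can be illustrated with a deliberately tiny model. The sketch below is not NEMO5's implementation. It assumes a single orbital (so every matrix becomes a complex number), one lead in the wide-band limit, no momentum integral, and an elastic scattering self-energy that is simply proportional to the Green's function, loosely mirroring the structure of Eq. (4). All names and parameter values are hypothetical. A loop of this kind is commonly called a self-consistent Born iteration.

```python
def solve_scba(energy, onsite=0.0, gamma=0.1, coupling=0.001, occupation=0.5,
               tol=1e-12, max_iter=1000):
    """Iterate toy versions of Eqs. (1), (2) and (4) to self-consistency.

    Hypothetical single-orbital model: one lead in the wide-band limit with
    broadening `gamma` and occupation `occupation`, plus elastic scattering
    of strength `coupling`. Energies are in arbitrary units.
    """
    sigma_r_lead = -0.5j * gamma             # retarded lead self-energy
    sigma_l_lead = 1j * gamma * occupation   # lesser lead self-energy
    sigma_r_scat = 0j                        # start from the ballistic solution
    sigma_l_scat = 0j
    for iteration in range(1, max_iter + 1):
        # Eq. (1): G^R = [E - H - Sigma^R]^-1 (scalar here, e*Phi set to zero)
        g_r = 1.0 / (energy - onsite - sigma_r_lead - sigma_r_scat)
        # Eq. (2): G^< = G^R Sigma^< G^R-dagger (dagger = conjugate for scalars)
        g_l = g_r * (sigma_l_lead + sigma_l_scat) * g_r.conjugate()
        # Eq. (4)-like update: scattering self-energies proportional to G
        new_r = coupling * g_r
        new_l = coupling * g_l
        change = abs(new_r - sigma_r_scat) + abs(new_l - sigma_l_scat)
        sigma_r_scat, sigma_l_scat = new_r, new_l
        if change < tol:
            return g_r, g_l, iteration
    raise RuntimeError("self-consistent Born loop did not converge")


g_r, g_l, iterations = solve_scba(energy=0.02)
print(f"converged after {iterations} iterations")
print(f"G^R = {g_r:.6f}, G^< = {g_l:.6f}")
```

In this toy model with a single lead in equilibrium, the converged result should satisfy $G^< = -2i f\, \mathrm{Im}\, G^R$ with occupation $f$, which gives a quick consistency check on the loop. The real calculation differs in every practical respect: the quantities are large matrices, the self-energies couple different momenta and energies, and the Poisson equation is solved alongside.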
Once converged, observables such as the density $n$ and the charge current density $j$ can be solved:

$$n = \int dE\, n(E) = \frac{1}{(2\pi)^2} \int dE \int dk\, \mathrm{Im}\left[\mathrm{diag}\, G^<(k, E)\right]. \tag{7}$$
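The energy integral of Eq. (7) can be sketched numerically for a toy case. The example below is an illustration under stated assumptions, not the NEMO5 integration scheme: it uses a single broadened level with a Lorentzian spectral function, a fully occupied lesser Green's function, no momentum integral (so the prefactor is $1/2\pi$ instead of $1/(2\pi)^2$), and a plain trapezoidal rule on a uniform energy grid. All function names and parameters are hypothetical.

```python
import math


def lesser_green(energy, onsite=0.0, gamma=0.1, occupation=1.0):
    """Toy G^<(E) = i * f * A(E) for one broadened level (assumed model)."""
    spectral = gamma / ((energy - onsite) ** 2 + (gamma / 2.0) ** 2)
    return 1j * occupation * spectral


def density(e_min=-50.0, e_max=50.0, steps=200_000):
    """Trapezoidal estimate of (1 / 2 pi) * integral dE Im G^<(E)."""
    h = (e_max - e_min) / steps
    total = 0.5 * (lesser_green(e_min).imag + lesser_green(e_max).imag)
    for i in range(1, steps):
        total += lesser_green(e_min + i * h).imag
    return total * h / (2.0 * math.pi)


n = density()
print(f"electrons in the level: {n:.4f}")
```

For a fully occupied level the result should come out just below 1, with the small deficit caused by the Lorentzian tails cut off by the finite energy window. Sharp resonances like this one are the reason a fine energy resolution is needed.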

Journal:

CoRR

Volume abs/1510.04686

Publication date 2015
